Efficiently Tolerating Failures in Asynchronous Real-Time Distributed Systems
Abstract
We present a proactive resource allocation algorithm, called BEA, for fault-tolerant asynchronous real-time distributed systems. BEA considers an application model where trans-node application timeliness requirements are expressed using benefit functions, and anticipated workloads during future time intervals are expressed using adaptation functions. Furthermore, BEA considers an adaptation model where subtasks of application tasks are replicated at run-time, both to tolerate failures and to share workload increases. Given these models, the objective of the algorithm is to maximize the aggregate real-time benefit and the ability to tolerate host failures during the time window of the adaptation functions. Since determining the optimal solution is computationally intractable, BEA heuristically computes near-optimal resource allocations in polynomial time. We show that BEA achieves almost the same fault-tolerance ability as full replication and accrues most of the real-time benefit that full replication can accrue. At the same time, BEA requires far fewer replicas than full replication, and hence is cost-effective.
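The abstract does not give BEA's details, so the following is only a minimal sketch of the general idea it describes: spending a limited number of hosts on subtask replicas so as to raise aggregate benefit. It is not BEA itself. The linear benefit function, the even workload split among replicas, the greedy marginal-benefit rule, and all names (`Subtask`, `benefit`, `latency`, `greedy_allocate`, `tolerated_failures`) are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    work: float         # assumed workload (time units) in the coming interval
    deadline: float     # latency at which the benefit reaches zero
    max_benefit: float  # benefit for an (idealized) zero-latency completion


def benefit(sub: Subtask, lat: float) -> float:
    """Hypothetical benefit function: linear decay from max_benefit to 0."""
    if lat >= sub.deadline:
        return 0.0
    return sub.max_benefit * (1.0 - lat / sub.deadline)


def latency(sub: Subtask, replicas: int) -> float:
    """Assumption: replicas share the subtask's workload evenly."""
    return sub.work / replicas


def tolerated_failures(replicas: int) -> int:
    """A subtask with r replicas on distinct hosts survives r - 1 host crashes."""
    return replicas - 1


def greedy_allocate(subtasks: list[Subtask], hosts: int) -> dict[str, int]:
    """Give every subtask one replica, then hand each spare host to the
    subtask whose benefit grows the most from one additional replica."""
    if hosts < len(subtasks):
        raise ValueError("need at least one host per subtask")
    alloc = {s.name: 1 for s in subtasks}
    for _ in range(hosts - len(subtasks)):
        best, best_gain = None, 0.0
        for s in subtasks:
            r = alloc[s.name]
            gain = benefit(s, latency(s, r + 1)) - benefit(s, latency(s, r))
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no extra replica improves benefit; stop early
            break
        alloc[best.name] += 1
    return alloc


tasks = [
    Subtask("A", work=10.0, deadline=10.0, max_benefit=100.0),
    Subtask("B", work=2.0, deadline=10.0, max_benefit=100.0),
]
allocation = greedy_allocate(tasks, hosts=4)
total = sum(benefit(s, latency(s, allocation[s.name])) for s in tasks)
print(allocation, round(total, 2))
```

In this toy setting the heavily loaded subtask receives both spare hosts, because each extra replica there yields a larger benefit gain than an extra replica of the lightly loaded subtask. A greedy rule of this kind stops adding replicas once they no longer pay off, which is one simple way an allocator could use fewer replicas than full replication.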
Related Resources
On Tolerating Failures of Mobile Hosts and Mobile Support Stations
In this paper, we present two fault-tolerant protocols for mobile computing systems: a causal message logging protocol and a receiver-based pessimistic message logging protocol, for tolerating failures of mobile hosts (MHs) and mobile support stations (MSSs), respectively. Such systems impose several constraints, such as limited battery life, mobility and disconnection of hosts, and lack of ...
Agreeing on Processor Group Membership in Timed Asynchronous Distributed Systems
We introduce the timed asynchronous distributed system model to describe existing asynchronous distributed systems subject to unbounded processing and communication delays, failures and recoveries. We then describe five increasingly strong specifications for processor-group membership services in timed asynchronous systems subject to partitioning. We also propose five distributed protocols that impl...
New Causal Message Logging Protocol with Asynchronous Checkpointing for Distributed Systems
Causal message logging is an efficient approach for tolerating process failures in distributed systems because it combines the advantages of the pessimistic and optimistic message logging approaches. However, traditional causal message logging protocols prevent live processes from continuously executing their computation and require some synchronous logging to stable storage during recovery....
Proactive QoS negotiation in asynchronous real-time distributed systems
We present a fast, proactive, quality of service (QoS) negotiation algorithm called Best Effort Negotiation (or BEN), for asynchronous real-time distributed systems. BEN considers an application model where trans-node application timeliness and fault-tolerance requirements are expressed using benefit functions, and anticipated workload and system failure rates during future time intervals are ex...
Subtleties in Tolerating Correlated Failures
High availability is widely accepted as an explicit requirement for distributed storage systems. Tolerating correlated failures is a key issue in achieving high availability in today’s wide-area environments. This paper systematically revisits previously proposed techniques for addressing correlated failures. Using a combination of experimental and mathematical analysis of several real-world fa...